Maximum A Posteriori (MAP)

Let $\mathbf{x} = (x_1, \dots, x_n)$ be i.i.d. realizations from the probability mass function $p_X(t; \Theta = \theta)$ (if $X$ is discrete), or from the density $f_X(t; \Theta = \theta)$ (if $X$ is continuous), where $\Theta$ is the random variable representing the parameter (or vector of parameters). We define the Maximum A Posteriori (MAP) estimator $\hat \theta_{MAP}$ of $\Theta$ to be the parameter value that maximizes the posterior distribution of $\Theta$ given the data.

$$\hat \theta_{MAP} = \arg \max_{\theta} L(\mathbf{x} | \theta) \, \pi_{\Theta}(\theta)$$

(This is the same as maximum likelihood estimation, except that instead of maximizing the likelihood alone, it maximizes the likelihood multiplied by the prior.)

where the likelihood function is $L(\mathbf{x} | \theta) = \prod_{i=1}^n f_X(x_i | \theta)$ for i.i.d. $f_X$ (compare maximum likelihood estimation), and $\pi_{\Theta}(\theta)$ is the prior on $\Theta$.
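As a minimal sketch of "likelihood multiplied by prior", the snippet below finds the MAP estimate of a Bernoulli parameter by grid search over the log of that product. The Bernoulli model, the Beta(a, b) prior, the data, and the function names are all illustrative assumptions and do not come from these notes.

```python
import math

def log_posterior_unnormalized(theta, x, a, b):
    """log L(x | theta) + log prior(theta), dropping terms constant in theta.

    Assumes x_i ~ Bernoulli(theta) and a Beta(a, b) prior (illustrative choices).
    """
    k = sum(x)   # number of ones in the sample
    n = len(x)
    log_lik = k * math.log(theta) + (n - k) * math.log(1.0 - theta)
    log_prior = (a - 1) * math.log(theta) + (b - 1) * math.log(1.0 - theta)
    return log_lik + log_prior

def bernoulli_map_grid(x, a, b, grid_size=10000):
    """Approximate argmax over theta in (0, 1) on an evenly spaced grid."""
    # Interior grid points only, so log(0) never occurs.
    grid = [i / grid_size for i in range(1, grid_size)]
    return max(grid, key=lambda t: log_posterior_unnormalized(t, x, a, b))

x = [1, 0, 1, 1, 0, 1, 1, 1]          # made-up data: 6 ones out of 8
theta_mle = sum(x) / len(x)           # maximum likelihood estimate
theta_map = bernoulli_map_grid(x, a=2, b=2)
```

For this conjugate pair the posterior mode is also available in closed form, $(k + a - 1)/(n + a + b - 2)$ for $a, b > 1$, so the grid result can be checked against it: here the MAP estimate is about 0.7, pulled from the maximum likelihood value 0.75 toward the prior mode 0.5.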


The estimate of $w$ from the noisy observation $y$ is denoted $\hat w$; because it depends on the observed (noisy) value $y$, it is also written $\hat w(y)$. To obtain the estimate $\hat w$, we will use the maximum a posteriori (MAP) estimator. The MAP estimator is based on the probability density function (pdf) of $w$ given the observation. Specifically, given an observed value $y$, the MAP estimator asks: what value of $w$ is most likely? That is, the MAP estimator looks for the value of $w$ where the conditional pdf of $w$ is highest; it looks for the peak. Therefore, the MAP estimator is defined as

$$\hat w(y) = \arg \max_w p_{w|y}(w|y)$$

where '$\arg \max$' denotes the value of the argument at which the function attains its maximum. The pdf $p_{w|y}(w|y)$ is the distribution of $w$ given a specific value $y$, where

$$p_{w|y}(w|y) = \frac{p_{w,y}(w,y)}{p_y(y)}$$
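A short worked step, using only Bayes' rule, connects this definition to the "likelihood times prior" form of the MAP estimator. The symbols $p_{y|w}$ (the likelihood of the observation) and $p_w$ (the prior on $w$) are introduced here for illustration. Since $p_y(y)$ does not depend on $w$, dividing by it does not move the location of the maximum:

$$
\begin{aligned}
\hat w(y) &= \arg \max_w \frac{p_{w,y}(w,y)}{p_y(y)} \\
          &= \arg \max_w p_{w,y}(w,y) \\
          &= \arg \max_w p_{y|w}(y|w) \, p_w(w)
\end{aligned}
$$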

(The MAP estimate $\hat w$ is the point where the pdf $p_{w|y}(w|y)$, for a given value of $y$, has its peak.)
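The "look for the peak" idea can be sketched numerically. The snippet below assumes a specific model that these notes do not state: additive noise $y = w + n$ with $n \sim N(0, \sigma^2)$ and a Laplace prior on $w$ with density $\exp(-|w|/b)/(2b)$. Under those assumptions the peak of the posterior is known to be the soft-threshold of $y$ with threshold $\sigma^2 / b$, which the grid search should reproduce. All function and parameter names are hypothetical.

```python
import math

def soft_threshold(y, t):
    """Shrink y toward zero by t; any y with |y| <= t maps to zero."""
    return math.copysign(max(abs(y) - t, 0.0), y)

def map_denoise_grid(y, sigma, b, half_width=10.0, steps=20001):
    """Grid-search argmax over w of log p(y | w) + log p(w).

    Assumed model (not from the notes): y = w + n, n ~ N(0, sigma^2),
    w ~ Laplace(0, b). Terms constant in w are dropped.
    """
    best_w, best_val = 0.0, -math.inf
    for i in range(steps):
        # Evenly spaced candidates on [-half_width, half_width].
        w = -half_width + 2.0 * half_width * i / (steps - 1)
        val = -(y - w) ** 2 / (2.0 * sigma ** 2) - abs(w) / b
        if val > best_val:
            best_w, best_val = w, val
    return best_w

sigma, b = 1.0, 2.0
threshold = sigma ** 2 / b   # 0.5 under the assumed model
estimates = {y: map_denoise_grid(y, sigma, b) for y in (-3.0, -0.5, 0.2, 1.0, 2.5)}
```

Small observations (those with $|y|$ at or below the threshold) are estimated as zero, and larger ones are shrunk toward zero by the threshold. A grid search is used only to keep the sketch transparent; it is limited by the grid spacing and range.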


If

#incomplete

Soft thresholding softmax


References:

  1. https://courses.cs.washington.edu/courses/cse312/22wi/files/student_drive/7.5.pdf
  2. https://en.wikipedia.org/wiki/Maximum_likelihood_estimation
  3. https://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation
  4. https://eeweb.engineering.nyu.edu/iselesni/lecture_notes/SoftThresholding.pdf